Lifelong Federated Reinforcement Learning: A Learning Architecture for Navigation in Cloud Robotic Systems
This paper was motivated by the problem of how to make robots fuse and
transfer their experience so that they can effectively use prior knowledge and
quickly adapt to new environments. To address the problem, we present a
learning architecture for navigation in cloud robotic systems: Lifelong
Federated Reinforcement Learning (LFRL). In this work, we propose a knowledge
fusion algorithm for upgrading a shared model deployed on the cloud. Then,
effective transfer learning methods in LFRL are introduced. LFRL is consistent
with human cognitive science and fits well in cloud robotic systems.
Experiments show that LFRL greatly improves the efficiency of reinforcement
learning for robot navigation. The cloud robotic system deployment also shows
that LFRL is capable of fusing prior knowledge. In addition, we release a cloud
robotic navigation-learning website based on LFRL.
Agricultural Robot for Intelligent Detection of Pyralidae Insects
Pyralidae insects are among the main pests of economic crops. However, manual detection and identification of Pyralidae insects is labor intensive and inefficient, and subjective factors can influence recognition accuracy. To address these shortcomings, this chapter presents an insect monitoring robot and a new method for recognizing Pyralidae insects. First, the robot acquires images by performing a fixed action and detects whether Pyralidae insects are present in them. The recognition method obtains the total probability image by using reverse mapping of the histogram and multi-template images, and the image contours are then extracted quickly and accurately by using constrained Otsu thresholding. Finally, the contours are filtered according to their Hu moment, perimeter, and area features, and recognition results marked with triangles are obtained. According to the recognition results, the speed of the robot car and mechanical arm can be adjusted adaptively. The theoretical analysis and experimental results show that the proposed scheme has high timeliness and high recognition accuracy in the natural planting scene.
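For readers unfamiliar with the thresholding step named in this abstract, the following is a minimal sketch of plain Otsu thresholding on a gray-level histogram. It is not the constrained variant the chapter proposes, whose constraints the abstract does not specify; the function name `otsu_threshold` and the toy histogram are assumptions made only for illustration.

```python
def otsu_threshold(hist):
    """Return the threshold t that maximizes between-class variance.

    hist[i] is the pixel count at intensity i. Pixels with intensity <= t
    form the lower class and the remaining pixels form the upper class.
    """
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # pixel count in the lower class
    sum0 = 0  # intensity sum in the lower class
    for t, h in enumerate(hist):
        w0 += h
        sum0 += t * h
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so the variance is undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to the constant factor 1 / total**2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# Toy bimodal histogram over 8 gray levels (made-up data, not from the chapter).
hist = [10, 12, 8, 0, 0, 9, 11, 10]
threshold = otsu_threshold(hist)
```

On this toy histogram the selected threshold separates the low-intensity mode (levels 0 to 2) from the high-intensity mode (levels 5 to 7).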
Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy
Proximal policy optimization and trust region policy optimization (PPO and
TRPO) with actor and critic parametrized by neural networks achieve significant
empirical success in deep reinforcement learning. However, due to nonconvexity,
the global convergence of PPO and TRPO remains less understood, which separates
theory from practice. In this paper, we prove that a variant of PPO and TRPO
equipped with overparametrized neural networks converges to the globally
optimal policy at a sublinear rate. The key to our analysis is the global
convergence of infinite-dimensional mirror descent under a notion of one-point
monotonicity, where the gradient and iterate are instantiated by neural
networks. In particular, the desirable representation power and optimization
geometry induced by the overparametrization of such neural networks allow them
to accurately approximate the infinite-dimensional gradient and iterate.
Comment: A short versio
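The mirror descent view mentioned in this abstract can be illustrated in a much simpler setting. The sketch below assumes a single state, a finite action set, and exactly known action values, so it is a tabular toy and not the paper's neural, infinite-dimensional variant. The name `mirror_descent_step`, the action values, and the step size are all hypothetical.

```python
import math


def mirror_descent_step(policy, q_values, step_size):
    """One KL-regularized mirror descent update on the probability simplex.

    The update is new(a) proportional to old(a) * exp(step_size * Q(a)).
    """
    weights = [p * math.exp(step_size * q) for p, q in zip(policy, q_values)]
    normalizer = sum(weights)
    return [w / normalizer for w in weights]


# Hypothetical fixed action values for a three-armed, single-state problem.
q_values = [1.0, 0.5, 0.2]
policy = [1 / 3, 1 / 3, 1 / 3]  # start from the uniform policy
for _ in range(200):
    policy = mirror_descent_step(policy, q_values, step_size=0.5)
```

With fixed action values, repeated updates concentrate probability mass on the action with the highest value, which is the optimal policy in this toy problem.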
Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms
ReParameterization (RP) Policy Gradient Methods (PGMs) have been widely
adopted for continuous control tasks in robotics and computer graphics.
However, recent studies have revealed that, when applied to long-term
reinforcement learning problems, model-based RP PGMs may experience chaotic and
non-smooth optimization landscapes with exploding gradient variance, which
leads to slow convergence. This is in contrast to the conventional belief that
reparameterization methods have low gradient estimation variance in problems
such as training deep generative models. To comprehend this phenomenon, we
conduct a theoretical examination of model-based RP PGMs and search for
solutions to the optimization difficulties. Specifically, we analyze the
convergence of the model-based RP PGMs and pinpoint the smoothness of function
approximators as a major factor that affects the quality of gradient
estimation. Based on our analysis, we propose a spectral normalization method
to mitigate the exploding variance issue caused by long model unrolls. Our
experimental results demonstrate that proper normalization significantly
reduces the gradient variance of model-based RP PGMs. As a result, the
performance of the proposed method is comparable or superior to other gradient
estimators, such as the Likelihood Ratio (LR) gradient estimator. Our code is
available at https://github.com/agentification/RP_PGM.
Comment: Published at NeurIPS 202
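As a rough illustration of the spectral normalization idea named in this abstract, the sketch below estimates the largest singular value of a weight matrix by power iteration and rescales the matrix to have spectral norm 1. It is a generic sketch under stated assumptions, not the code from the linked repository; the helper names and the example matrix are hypothetical.

```python
import math


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def spectral_norm(m, iters=100):
    """Estimate the largest singular value of m by power iteration on m^T m."""
    mt = transpose(m)
    v = [1.0] * len(m[0])
    for _ in range(iters):
        v = matvec(mt, matvec(m, v))
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    mv = matvec(m, v)
    return math.sqrt(sum(x * x for x in mv))


def spectrally_normalize(m, iters=100):
    """Rescale m so that its largest singular value is approximately 1."""
    sigma = spectral_norm(m, iters)
    return [[x / sigma for x in row] for row in m]


# Hypothetical 2x2 weight matrix with singular values 3.0 and 0.5.
weights = [[3.0, 0.0], [0.0, 0.5]]
sigma = spectral_norm(weights)
normalized = spectrally_normalize(weights)
```

Bounding the spectral norm of each linear layer bounds how much that layer can stretch its input, which is one way to keep a function approximator smooth across long model unrolls.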
p38MAPK plays a pivotal role in the development of acute respiratory distress syndrome
Acute respiratory distress syndrome (ARDS) is a life-threatening illness characterized by a complex pathophysiology, involving not only the respiratory system but also nonpulmonary distal organs. Although advances in the management of ARDS have led to a distinct improvement in ARDS-related mortality, ARDS is still a life-threatening respiratory condition with long-term consequences. A better understanding of the pathophysiology of this condition will allow us to create a personalized treatment strategy for improving clinical outcomes. In this article, we present a general overview of p38 mitogen-activated protein kinase (p38MAPK) and recent advances in understanding its functions. We consider the potential of the pharmacological targeting of p38MAPK pathways to treat ARDS.